ABSTRACT

Homozygosity mapping is a common method to 
map recessive traits in consanguineous families. 
To facilitate these analyses, we have developed 
HomozygosityMapper, a web-based approach to 
homozygosity mapping. HomozygosityMapper 
allows researchers to directly upload the genotype 
files produced by the major genotyping platforms 
as well as deep sequencing data. It detects 
stretches of homozygosity shared by the affected 
individuals and displays them graphically. Users 
can interactively inspect the underlying genotypes, 
manually refine these regions and eventually submit 
them to our candidate gene search engine 
GeneDistiller to identify the most promising
candidate genes. Here, we present the new version of
HomozygosityMapper. The most striking new 
feature is the support of Next Generation 
Sequencing *.vcf files as input. Upon users' 
requests, we have implemented the analysis of 
common experimental rodents as well as of important
farm animals. Furthermore, we have extended
the options for single families and loss of heterozygosity
studies. Another new feature is the export of
*.bed files for targeted enrichment of the potential 
disease regions for deep sequencing strategies. 
HomozygosityMapper also generates files for 
conventional linkage analyses which are already 
restricted to the possible disease regions, hence 
superseding CPU-intensive genome-wide analyses. 
HomozygosityMapper is freely available at
http://www.homozygositymapper.org/.



INTRODUCTION 

Linkage analysis is still widely considered the 'gold 
standard' for disease gene mapping. However, especially 
in complex consanguineous families, these analyses 
require high-performance computers. Even then, a 
multi-point analysis of a medium-sized genotyping array 
with 100000 single nucleotide polymorphisms (SNPs) may 
take weeks. In one of our benchmarking experiments, the 
analysis of 50000 markers in a single consanguineous 
family needed more than 12 weeks to complete. 
Although it is possible to employ only a subset of a few 
thousand markers in the initial analysis and to re-analyse 
only the homozygous regions with the complete marker 
set, such an analysis can still take several hours. 
To overcome the restraints posed by linkage software, 
we have developed HomozygosityMapper (1), a 
web-based approach to homozygosity mapping. 

As the basic concept of homozygosity mapping is to 
trace the inheritance of the same chromosomal region 
from an ancestor via two consanguineous heterozygous 
parents and hence homozygosity in the patients, the 
disease region must be homozygous in all affected family 
members. It is thus not necessary to waste CPU time on 
a lengthy whole genome multipoint linkage analysis only 
to search for homozygous regions in the patients. Several 
applications (1-5), including HomozygosityMapper, 
therefore simply detect homozygous stretches in the 
patients and score them according to their length. In 
contrast to the other tools, HomozygosityMapper is 
entirely web-based so that no software installation is 
required. Users can upload their genotype files into our 
database without the need for reformatting, define the 
samples that represent affected or healthy individuals 
and immediately start the search for homozygosity. 
Further information such as marker positions or allele 
frequencies is already stored in the database. After the
analysis, the genome-wide homozygosity is plotted 
against the marker coordinates, interesting regions are 
highlighted in the plot and listed in a table (Figure 1). 
The application also offers the users the ability to 
inspect single chromosomes. Furthermore, the underlying 
genotypes can be displayed in a colour-coded matrix plot 
that highlights long homozygous regions (Figure 2). 



The entire process of upload, analysis and display is
completed within 5 min for a 50K genotyping project
with six samples; arrays featuring one million SNPs are
completed in less than 30 min.

HomozygosityMapper provides various links to our
candidate gene search engine GeneDistiller (7). It has
hence become very convenient to proceed from the
genotype file to the search for candidate genes. The
process is so simple and intuitive that clinicians or
researchers are able to analyse their data on their own
without the need to consult dedicated IT specialists.

Figure 1. Genome-wide homozygosity. This screen shot shows the genome-wide homozygosity scores produced by HomozygosityMapper. These are
plotted as a bar chart with red bars indicating the most promising genomic regions. Clicking on a bar will zoom into the chromosome. Above the bar
chart, the excess or shortage of homozygous genotypes in cases versus controls is depicted. Below the figure, direct links to the most interesting
regions are given and data export possibilities are provided. All figures depict the Carpenter syndrome study (6).

Nucleic Acids Research, 2012, Vol. 40, Web Server issue

Example CS - Carpenter syndrome
Figure 2. Genotypes view. HomozygosityMapper also displays the genotypes of all samples. Here, the markers are placed on the x-axis while the
samples are on the y-axis, with the patients on top and with red IDs. Genotypes are colour-coded: grey, unknown; blue, heterozygous; red,
homozygous stretches (colour saturation reflects the length of the stretch). This figure also reveals the presence of a single heterozygous marker
within the homozygous region (possibly a genotyping error and ignored by HomozygosityMapper). The patient on the bottom is from another family
than the first two and does not share the same haplotype over the entire homozygous stretch. This can be seen from the genotypes with the diagonal
bar indicating the less abundant of the homozygous genotypes. Users are free to change the boundaries of the region and can subsequently submit
this region to GeneDistiller.

The advent of Next Generation Sequencing (NGS) 
approaches has recently shifted the strategy to identify 
the disease mutation from sequencing candidate genes 
(8) to sequencing entire chromosomal regions (9). We 
expect that due to the falling costs, NGS will increasingly 
be applied no longer only for mutation detection after a 
linkage analysis but indeed as a single method to combine 
both (10). 

Here, we present the novel version of 
HomozygosityMapper that generates the *.bed files 
needed to include all possible disease regions in targeted 
enrichment deep sequencing. The software also covers the 
opposite approach, i.e. a fast homozygosity mapping on 
whole genome or whole exome sequencing data produced 
before a linkage analysis. According to our users' wishes, 
we have extended the software to handle other species 
than humans (11). Our database currently includes the 
most common model organisms and farm animals but 
can be extended to other species at short notice.
Furthermore, we have refined the options to handle 
single families and loss of heterozygosity studies. 

A detailed description of the approach, the features of 
HomozygosityMapper and data on its performance is 
provided in the original publication (1) and on our 
website. 



CHANGES IN THE NEW VERSION 

Since its introduction, HomozygosityMapper has become 
a widely used tool for homozygosity mapping (8,9,11-15).
By winter 2011-2012, 8 billion genotypes, created in more
than 3300 mapping projects, have been permanently 
stored in our database. As users are free to use the 
software without registration and to delete their data 
after the analysis, the actual number of analysed
projects is probably even higher. Given the large number
of users, we received numerous suggestions on how to
improve our software. These ranged from bug reports 
to completely new features. In the new version of 
HomozygosityMapper, we have integrated many of these. 



Import of NGS genotypes 

The most striking new feature of HomozygosityMapper is 
the integration of NGS data. Users can now directly 
upload the Variant Call Format (*.vcf) files generated in 
NGS projects. HomozygosityMapper will then pick all 
positions in which either variations from the RefSeq are 
found or which are known to bear SNPs and store the 
genotypes. We provide a description of the import file 
format and the generation of these files with SAMtools 
(16) on our website. 
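
How per-sample genotypes can be read from a *.vcf file is sketched below in Python. The sketch relies only on the standard VCF column layout (sample columns start at the tenth field, and the GT subfield holds allele indices separated by '/' or '|'). It is not the import routine of HomozygosityMapper; the function name and the returned data structure are assumptions, and the sketch expects every record to carry a GT subfield.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

Call = Optional[Tuple[str, str]]


def parse_vcf_genotypes(lines: Iterable[str]) -> List[Tuple[str, int, Dict[str, Call]]]:
    """Return (chromosome, position, {sample: (allele1, allele2) or None}) per record."""
    samples: List[str] = []
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip empty lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos = fields[0], int(fields[1])
        alleles = [fields[3]] + fields[4].split(",")  # REF first, then ALT alleles
        gt_index = fields[8].split(":").index("GT")
        calls: Dict[str, Call] = {}
        for name, entry in zip(samples, fields[9:]):
            indices = re.split(r"[/|]", entry.split(":")[gt_index])
            if len(indices) != 2 or "." in indices:
                calls[name] = None  # missing or non-diploid call
            else:
                calls[name] = (alleles[int(indices[0])], alleles[int(indices[1])])
        records.append((chrom, pos, calls))
    return records
```

A call is homozygous when both tuple elements are identical, so the output can be passed directly to a search for homozygous stretches.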

Integration of additional species 

As homozygosity mapping is also employed by animal 
breeders and by researchers working on model organisms, 
we had several requests for animal versions. An early 
prototype was successfully employed in the mapping of 
generalised progressive retinal atrophy in dogs (11). We 
have now integrated a framework to include an unlimited 
number of species into the application on short notice. 
So far, data for seven different model organisms or 
breeding animals (humans, cattle, dogs, horses, mice, 
rats and sheep) are stored in our database. We encourage 
our users to contact us requesting the integration of 
other species. 

Optimisation of single family approaches 

The original release of HomozygosityMapper was aimed 
at the classic setting for homozygosity mapping with cases 
from different consanguineous families where genetic 
homogeneity of the disease was not absolutely sure. 
In single families, however, affected individuals should 
carry the same disease haplotype. We have now included 
an option to require genetic homogeneity around the 
disease locus. 

Users can also decide to include the genetic information 
of healthy siblings to further narrow down the disease 
region considerably. In contrast to the standard settings, 
this approach does not only search for regions that are 
homozygous in many affected individuals but also for 
a homozygous disease haplotype shared by all affected 
individuals which must not be present in any of the 
healthy controls. 
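
The logic of this option can be illustrated with a minimal Python sketch. It assumes that "not present in any of the healthy controls" means that no control is homozygous for the shared haplotype over the whole region. This reading, the function names and the two-letter genotype encoding are our assumptions for illustration and do not reproduce the actual implementation.

```python
from typing import List, Optional


def shared_homozygous_haplotype(affected: List[List[str]]) -> Optional[List[str]]:
    """Return the genotypes shared by all affected individuals, or None.

    Every marker in the region must be homozygous and identical in all
    affected individuals; otherwise no common disease haplotype exists.
    """
    haplotype = []
    for calls in zip(*affected):
        first = calls[0]
        if len(first) != 2 or first[0] != first[1] or first[0] not in "ACGT":
            return None
        if any(call != first for call in calls):
            return None
        haplotype.append(first)
    return haplotype


def region_is_candidate(affected: List[List[str]], controls: List[List[str]]) -> bool:
    """True if all affected share one homozygous haplotype that no control carries
    in the homozygous state over the entire region."""
    haplotype = shared_homozygous_haplotype(affected)
    if haplotype is None:
        return False
    return all(list(control) != haplotype for control in controls)
```

A healthy sibling who is homozygous for the same haplotype across the region excludes it, whereas heterozygous carriers do not.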

Optimisation of loss of heterozygosity studies and genetic heterogeneity

Studies for loss of heterozygosity (or microdeletions) are 
often performed on large cohorts of unrelated subjects 
with similar phenotypes. Here, a high degree of heterogeneity
is possible (17,18). We have therefore added an
option to exclude very short homozygous regions
because these may occur due to uninformative markers
and also by chance. In this mode, the search focuses on 
long regions only shared by some patients. This approach 
is also useful when genetic heterogeneity is expected in 
a classic homozygosity mapping setting, e.g. with 
patients from different regions or with slightly different 
phenotypes. 
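
A hedged Python sketch of this mode: short homozygous regions are discarded first, and the remaining long regions are then counted per marker so that regions shared by only some patients still stand out. The length threshold in markers, the function names and the data layout are assumptions and are not taken from the software.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # inclusive (start, end) marker indices


def keep_long_regions(regions_by_patient: Dict[str, List[Region]],
                      min_markers: int) -> Dict[str, List[Region]]:
    """Drop homozygous regions spanning fewer than min_markers markers."""
    return {
        patient: [(s, e) for s, e in regions if e - s + 1 >= min_markers]
        for patient, regions in regions_by_patient.items()
    }


def patients_per_marker(kept: Dict[str, List[Region]], n_markers: int) -> List[int]:
    """Count for each marker how many patients have a long region covering it."""
    counts = [0] * n_markers
    for regions in kept.values():
        for start, end in regions:
            for i in range(start, end + 1):
                counts[i] += 1
    return counts
```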

Improved transition to GeneDistiller 

We have optimised the transition between the identification
of homozygous stretches in the genome and the
analysis of potential candidate genes therein. Users can
now seamlessly switch to the candidate gene search at
any step after the analysis. They can search for homozygosity
around candidate genes, query the genes contained
in all interesting regions at once or view the genotypes 
within a single region, refine the region and search 
among the genes it contains. 

File export for linkage analysis 

As homozygosity is the a priori condition for homozygosity
mapping, it is not necessary to perform a
CPU-intensive whole genome linkage analysis when 
searching for homozygosity in consanguineous families. 
The search for linkage can be restricted to longer stretches 
of homozygosity shared by all or at least many patients. 
HomozygosityMapper now offers the export of genotype 
and map files for the potential disease regions in the 
format used by ALOHOMORA (19). ALOHOMORA is 
a tool that converts SNP genotypes and marker information
into the input files required by common linkage
analysis software. Using this approach, a multipoint 
linkage analysis can be restricted to the possible disease 
regions thus sparing considerable CPU time. 

File export for NGS 

With the advent of NGS technologies, it has become 
feasible to search for mutations in numerous genes or 
even in complete linkage intervals in one run. In the case 
of homozygosity mapping, all genes that are positional 
candidates, i.e. located in homozygous regions, thus can 
be sequenced simultaneously. HomozygosityMapper can 
now generate the *.bed files needed for the targeted DNA 
capture of the possible disease regions. The files can either 
cover (a) all homozygous regions completely, (b) only the 
genes contained within them or (c) only the exons plus 
a user-defined flanking region. 
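
Option (c) can be sketched in Python as follows. The BED convention used here (tab-separated chromosome, 0-based start and exclusive end) is standard; the function name, the 1-based inclusive input coordinates and the merging of overlapping intervals are assumptions for illustration and may differ from the files generated by HomozygosityMapper.

```python
from typing import Iterable, List, Tuple


def exons_to_bed(exons: Iterable[Tuple[str, int, int]], flank: int = 0) -> str:
    """Convert exons to BED lines, padded by a flanking region.

    exons: (chromosome, start, end) with 1-based inclusive coordinates.
    Returns tab-separated BED text with 0-based, half-open intervals.
    """
    # Shift the start to 0-based, add the flank and clip at the chromosome start.
    padded = sorted((chrom, max(0, start - 1 - flank), end + flank)
                    for chrom, start, end in exons)
    merged: List[List] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlapping or adjacent to the previous interval: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in merged)
```

Merging avoids targeting the same bases twice when the flanking regions of neighbouring exons overlap.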

Performance 

Because of the increasing use of HomozygosityMapper 
with very large datasets (the most prominent were 200 
million genotypes from 1000 samples genotyped with
Affymetrix Axiom chips), we have restructured our
database for a better performance on huge datasets, 
mainly by adding further indices, a different commit 
strategy and an adapted configuration. We have recently 
acquired a new RAID system that will further increase 
query speed. 

Privacy 

The original version of HomozygosityMapper required 
users to log in to create projects only visible to themselves.
We have now added the possibility to make a project 
private without a user account. In these cases, a secret 
key is issued when the genotypes are uploaded. Such 
projects can only be accessed with this key and they
are not displayed in any lists. However, in these cases, 
users lose access to their data if they lose their key. 
Of course, this key can be shared with collaborators but 
this will grant them unlimited access to the data. 
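
One generic way to realise such key-protected access is sketched below in Python. How HomozygosityMapper generates and stores its keys is not described here, so everything in the sketch (random URL-safe keys, storing only a SHA-256 digest, the function names) is an assumption for illustration.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def issue_project_key() -> Tuple[str, str]:
    """Create a random secret key and the digest that a server could store.

    The key itself is shown to the user once; only the digest is kept,
    so a lost key cannot be recovered.
    """
    key = secrets.token_urlsafe(32)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return key, digest


def key_matches(candidate: str, stored_digest: str) -> bool:
    """Check a presented key against the stored digest in constant time."""
    candidate_digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate_digest, stored_digest)
```

Anyone holding the key passes the check, which reflects the caveat that sharing the key grants collaborators full access.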

New data import formats 

Besides the integration of *.vcf files, we have extended the 
possible genotyping arrays to allow the import of
genotypes generated on recent arrays such as the Affymetrix
Axiom family. We have also adapted the import routine
so that further file formats (some of which were in-house 
formats of other groups) are accepted. We will gladly 
add further possible arrays and formats on request. 

Implementation 

HomozygosityMapper was programmed in Perl. It makes 
use of a PostgreSQL 8.3 database. Web server and 
database run on an Intel Xeon platform with two 
QuadCore processors and 48 GB of RAM under Fedora 
Core Linux. A thorough description of the
implementation can be found on the website.

The website was developed with and optimised for 
Mozilla Firefox 2-10. It was successfully tested with 
Firefox 2-10 (under different versions of Linux, 
Microsoft Windows and MacOS) and Microsoft Internet 
Explorer 6, 7 and 8. 

Future plans 

We are continuously improving and extending
HomozygosityMapper. The next milestone will be the 
tight integration of MutationTaster (20), our web-based 
tool to predict the disease potential of DNA alterations, 
into the analysis of deep sequencing genotypes. With a 
future interface, users will be able to retrieve a list of all 
homozygous variants that are located in possible disease 
regions and have a high disease potential. We will add new 
species, new genotyping assays and support for new file 
formats upon request on short notice. 

CONCLUSION 

We have presented the novel version of 
HomozygosityMapper, a web-based application aimed at 
homozygosity mapping of SNP genotypes and NGS data 
in different species. HomozygosityMapper is freely 
accessible at http://www.homozygositymapper.org/ and
there is no login requirement. We provide a step-by-step 
tutorial and a detailed documentation on our website. 